OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Character extraction from documents using wavelet maxima

Internal identifier: 002141 (Main/Exploration); previous: 002140; next: 002142

Authors: Wen L. Hwang [People's Republic of China, Taiwan]; Fu Chang [People's Republic of China]

Source: Image and Vision Computing, vol. 16, no. 5, pp. 307-315

RBID : ISTEX:169DF10BF44F302ED8A5331A523BADAD1C8F10F9

French descriptors

English descriptors

Abstract

The extraction of character images is an important front-end processing task in optical character recognition (OCR) and other applications. This process is extremely important because OCR applications usually extract salient features and process them. The existence of noise not only destroys features of characters, but also introduces unwanted features. We propose a new algorithm which removes unwanted background noise from a textual image. Our algorithm is based on the observation that the magnitude of the intensity variation of character boundaries differs from that of noise at various scales of their wavelet transform. Therefore, most of the edges corresponding to the character boundaries at each scale can be extracted using a thresholding method. The internal region of a character is determined by a voting procedure, which uses the arguments of the remaining edges. The interior of the recovered characters is solid, containing no holes. The recovered characters tend to become fattened because of the smoothness applied in the calculation of the wavelet transform. To obtain a quality restoration of the character image, the precise locations of characters in the original image are then estimated using a Bayesian criterion. We also present some experimental results that suggest the effectiveness of our method.
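
The thresholding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in the gradient of a Gaussian-smoothed image at a few scales as a stand-in for the wavelet transform, and the function names (`gaussian_kernel`, `smooth`, `edge_maps`), the scale set `(1, 2, 4)`, and the relative threshold `rel_threshold` are all assumptions made for illustration. The voting and Bayesian refinement steps are not covered.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Return a 1-D Gaussian kernel normalised to sum to 1."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def smooth(image, sigma):
    """Separable Gaussian smoothing with edge padding (numpy only)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image, r, mode="edge")
    # Filter along rows, then along columns; "valid" undoes the padding.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def edge_maps(image, scales=(1, 2, 4), rel_threshold=0.5):
    """For each scale, return (kept-edge mask, gradient argument).

    Assumption: an edge is kept when its gradient modulus exceeds
    rel_threshold times the largest modulus at that scale. The source
    does not state the actual thresholding rule.
    """
    maps = []
    for sigma in scales:
        gy, gx = np.gradient(smooth(image.astype(float), sigma))
        modulus = np.hypot(gx, gy)       # strength of the intensity variation
        argument = np.arctan2(gy, gx)    # direction of the intensity variation
        keep = modulus > rel_threshold * modulus.max()
        maps.append((keep, argument))
    return maps


if __name__ == "__main__":
    # Synthetic test image: a bright square (standing in for a character
    # stroke) on a dark background, plus weak Gaussian noise.
    img = np.zeros((32, 32))
    img[8:24, 8:24] = 1.0
    img += 0.05 * np.random.default_rng(0).standard_normal(img.shape)
    for sigma, (keep, _) in zip((1, 2, 4), edge_maps(img)):
        print(f"scale {sigma}: {int(keep.sum())} edge pixels kept")
```

The argument array is returned alongside the mask because the abstract says the interior of a character is later determined from the arguments of the remaining edges; how that vote is carried out is not specified in this record.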

Url:
DOI: 10.1016/S0262-8856(97)00063-2


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Character extraction from documents using wavelet maxima</title>
<author>
<name sortKey="Hwang, Wen L" sort="Hwang, Wen L" uniqKey="Hwang W" first="Wen L." last="Hwang">Wen L. Hwang</name>
</author>
<author>
<name sortKey="Chang, Fu" sort="Chang, Fu" uniqKey="Chang F" first="Fu" last="Chang">Fu Chang</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:169DF10BF44F302ED8A5331A523BADAD1C8F10F9</idno>
<date when="1998" year="1998">1998</date>
<idno type="doi">10.1016/S0262-8856(97)00063-2</idno>
<idno type="url">https://api.istex.fr/document/169DF10BF44F302ED8A5331A523BADAD1C8F10F9/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000739</idno>
<idno type="wicri:Area/Istex/Curation">000731</idno>
<idno type="wicri:Area/Istex/Checkpoint">001645</idno>
<idno type="wicri:doubleKey">0262-8856:1998:Hwang W:character:extraction:from</idno>
<idno type="wicri:Area/Main/Merge">002258</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:98-0263688</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000889</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000B08</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000850</idno>
<idno type="wicri:doubleKey">0262-8856:1998:Hwang W:character:extraction:from</idno>
<idno type="wicri:Area/Main/Merge">002444</idno>
<idno type="wicri:Area/Main/Curation">002141</idno>
<idno type="wicri:Area/Main/Exploration">002141</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">Character extraction from documents using wavelet maxima</title>
<author>
<name sortKey="Hwang, Wen L" sort="Hwang, Wen L" uniqKey="Hwang W" first="Wen L." last="Hwang">Wen L. Hwang</name>
<affiliation wicri:level="1">
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Institute of Information Science, Academia Sinica, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Taïwan</country>
</affiliation>
</author>
<author>
<name sortKey="Chang, Fu" sort="Chang, Fu" uniqKey="Chang F" first="Fu" last="Chang">Fu Chang</name>
<affiliation wicri:level="1">
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Institute of Information Science, Academia Sinica, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Image and Vision Computing</title>
<title level="j" type="abbrev">IMAVIS</title>
<idno type="ISSN">0262-8856</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="1997">1997</date>
<biblScope unit="volume">16</biblScope>
<biblScope unit="issue">5</biblScope>
<biblScope unit="page" from="307">307</biblScope>
<biblScope unit="page" to="315">315</biblScope>
</imprint>
<idno type="ISSN">0262-8856</idno>
</series>
<idno type="istex">169DF10BF44F302ED8A5331A523BADAD1C8F10F9</idno>
<idno type="DOI">10.1016/S0262-8856(97)00063-2</idno>
<idno type="PII">S0262-8856(97)00063-2</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0262-8856</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithm</term>
<term>Algorithm performance</term>
<term>Character processing</term>
<term>Edge detection</term>
<term>Image processing</term>
<term>Optical character recognition</term>
<term>Pattern extraction</term>
<term>Pattern recognition</term>
<term>Smoothing</term>
<term>Threshold</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Algorithme</term>
<term>Détection contour</term>
<term>Extraction forme</term>
<term>Lissage</term>
<term>Performance algorithme</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance optique caractère</term>
<term>Seuil</term>
<term>Traitement caractère</term>
<term>Traitement image</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The extraction of character images is an important front-end processing task in optical character recognition (OCR) and other applications. This process is extremely important because OCR applications usually extract salient features and process them. The existence of noise not only destroys features of characters, but also introduces unwanted features. We propose a new algorithm which removes unwanted background noise from a textual image. Our algorithm is based on the observation that the magnitude of the intensity variation of character boundaries differs from that of noise at various scales of their wavelet transform. Therefore, most of the edges corresponding to the character boundaries at each scale can be extracted using a thresholding method. The internal region of a character is determined by a voting procedure, which uses the arguments of the remaining edges. The interior of the recovered characters is solid, containing no holes. The recovered characters tend to become fattened because of the smoothness applied in the calculation of the wavelet transform. To obtain a quality restoration of the character image, the precise locations of characters in the original image are then estimated using a Bayesian criterion. We also present some experimental results that suggest the effectiveness of our method.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
<li>Taïwan</li>
</country>
</list>
<tree>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Hwang, Wen L" sort="Hwang, Wen L" uniqKey="Hwang W" first="Wen L." last="Hwang">Wen L. Hwang</name>
</noRegion>
<name sortKey="Chang, Fu" sort="Chang, Fu" uniqKey="Chang F" first="Fu" last="Chang">Fu Chang</name>
</country>
<country name="Taïwan">
<noRegion>
<name sortKey="Hwang, Wen L" sort="Hwang, Wen L" uniqKey="Hwang W" first="Wen L." last="Hwang">Wen L. Hwang</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
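
Reading the record above programmatically needs one precaution: the `wicri:` prefix is used without a namespace declaration, so a namespace-aware XML parser will reject the text as written. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`. The URI in `WICRI_NS` is a hypothetical placeholder (the page does not give the real one), the helper names `parse_record` and `summarize` are illustrative, and `SAMPLE` is a trimmed excerpt of the record.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder URI: the page does not declare the wicri namespace.
WICRI_NS = "urn:example:wicri"

# Trimmed excerpt of the record shown above.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Character extraction from documents using wavelet maxima</title>
<author><name>Wen L. Hwang</name></author>
<author><name>Fu Chang</name></author>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1016/S0262-8856(97)00063-2</idno>
</publicationStmt>
</fileDesc>
<profileDesc><textClass>
<keywords scheme="KwdEn" xml:lang="en"><term>Edge detection</term><term>Threshold</term></keywords>
</textClass></profileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(xml_text):
    """Parse a <record> whose wicri: prefix is undeclared.

    The prefix is bound to the placeholder URI on the root element
    before parsing; the xml: prefix is predefined and needs no binding.
    """
    patched = xml_text.replace(
        "<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1
    )
    return ET.fromstring(patched)


def summarize(record):
    """Collect title, author names, DOI, and English keywords."""
    header = record.find("./TEI/teiHeader/fileDesc")
    return {
        "title": header.findtext("./titleStmt/title"),
        "authors": [n.text for n in header.findall("./titleStmt/author/name")],
        "doi": header.findtext("./publicationStmt/idno[@type='doi']"),
        "keywords": [
            t.text for t in record.findall(".//keywords[@scheme='KwdEn']/term")
        ],
    }


if __name__ == "__main__":
    print(summarize(parse_record(SAMPLE)))
```

Patching the root element keeps the rest of the record untouched, so the same paths work on the full record as on the excerpt.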

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002141 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002141 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:169DF10BF44F302ED8A5331A523BADAD1C8F10F9
   |texte=   Character extraction from documents using wavelet maxima
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024